Synchronization, Coherence, and Consistency for High Performance Shared-memory Multiprocessing
Authors
Abstract
Although improved device technology has increased the performance of computer systems, fundamental hardware limitations and the need to build faster systems using existing technology have led many computer system designers to consider parallel designs with multiple computing elements. Unfortunately, the design of efficient and scalable multiprocessors has proven to be an elusive goal. This dissertation describes a hierarchical bus-based multiprocessor architecture, an adaptive cache coherence protocol, and efficient and simple synchronization support that together meet this challenge. We have also developed an execution-driven tool for the simulation of shared-memory multiprocessors, which we use to evaluate the proposed architectural enhancements. Our simulator offers substantial advantages in terms of reduced time and space overheads when compared to instruction-driven or trace-driven simulation techniques, without significant loss of accuracy. The simulator generates correctly interleaved parallel traces at run time, allowing the accurate simulation of a variety of architectural alternatives for a number of programs. Our results provide a quantitative analysis of the viability of large-scale bus-based memory hierarchies. We evaluate the effect on performance of several architectural enhancements, and discuss the tradeoffs between reducing contention and increasing latency as the number of levels in the memory hierarchy is increased. Toward this end, we have developed a cache coherence protocol for a hierarchical bus-based architecture that minimizes total communication overhead by utilizing all available (bus-provided) information. Based on our evaluation, we propose an integrated set of architectural design decisions.
These include synchronization using a conditional test&set operation that eliminates excess bus traffic and contention; conditional access scheduling, where bus traffic is reduced by keeping track of pending bus accesses for every cache line; adaptive caching, where each cache line is assigned a coherence protocol based upon the expected or observed access behavior for that line; and the use of relaxed memory consistency models, where writes are aggressively buffered. We also present a new classification of memory consistency models that, in addition to unifying all existing models into a common framework, provides insight into the implications of these models with respect to access ordering.
Acknowledgments
I would like to express my sincere thanks to the members of my committee, Dr. during the course of this research have been invaluable. I am grateful for their patient reading and incisive comments on the writing of this thesis, without which I would still be writing. My officemates Bill Foundoulis and Rajat Mukherjee have made these years as …
Similar resources
Toward Large Scale Shared Memory Multiprocessing
We are currently investigating two different approaches to scalable shared memory: Munin, a distributed shared memory (DSM) system implemented entirely in software, and Willow, a true shared memory multiprocessor with extensive hardware support for scalability. Munin allows parallel programs written for shared memory multiprocessors to be executed efficiently on distributed memory multiprocessors. Unlike...
KIMP: Multicheckpointing Multiprocessors
Multiprocessors are coming into wide-spread use in many application areas, yet there are a number of challenges to achieving a good tradeoff between complexity and performance. For example, while implementing memory coherence and consistency is essential for correctness, efficient implementation of critical sections and synchronization points is desirable for performance. The multi-checkpointin...
Concord: Re-Thinking the Division of Labor in a Distributed Shared Memory System
A distributed shared memory system provides the abstraction of a shared address space on either a network of workstations or a distributed-memory multiprocessor. Although a distributed shared memory system can improve performance by relaxing the memory consistency model and maintaining memory coherence at a granularity specified by the programmer, the challenge is to offer ease of programming whi...
A New Relaxed Memory Consistency Model for Shared-Memory Multiprocessors with Parallel-Multithreaded Processing Elements
The release consistency model is the generally accepted hardware-centric relaxed memory consistency model because of its performance and implementation complexity. By extending the release consistency model, in this paper, we propose a hardware-centric memory consistency model particularly for shared-memory multiprocessor systems with parallel-multithreaded processing elements. The new model us...
Formal Verification of Delayed Consistency Protocols
In a cache-coherent, shared-memory multiprocessor system, data consistency among cached copies can be delayed until synchronization points under relaxed memory consistency models. Some protocols called delayed consistency protocols take advantage of this flexibility to reduce cache miss rates and memory traffic. However, they are very complex and validating their correctness, even at the behavi...
Journal title:
Volume, issue:
Pages: -
Publication date: 1992